Overview

Dataset statistics

Number of variables: 16
Number of observations: 512956
Missing cells: 1269839
Missing cells (%): 15.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 66.5 MiB
Average record size in memory: 136.0 B
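The percentages and averages above follow from the raw counts. The sketch below recomputes them from the figures in this block; it assumes 1 MiB = 2**20 bytes, and because the reported 66.5 MiB is itself rounded, the recomputed record size only agrees with the reported 136.0 B to within rounding.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 16
n_observations = 512_956
missing_cells = 1_269_839
total_size_mib = 66.5

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes. The input is rounded to 0.1 MiB,
# so the result is only approximate.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: about {avg_record_bytes:.0f} B")
```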

Variable types

Categorical: 8
Numeric: 7
Unsupported: 1

Alerts

Open has constant value "1.0" Constant
Date has a high cardinality: 577 distinct values High cardinality
PromoInterval is highly correlated with Promo2 and 1 other fields High correlation
Promo is highly correlated with Open High correlation
Promo2 is highly correlated with PromoInterval and 1 other fields High correlation
Assortment is highly correlated with StoreType and 1 other fields High correlation
StoreType is highly correlated with Assortment and 1 other fields High correlation
SchoolHoliday is highly correlated with Open High correlation
Open is highly correlated with PromoInterval and 5 other fields High correlation
StoreType is highly correlated with Assortment High correlation
Assortment is highly correlated with StoreType High correlation
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek High correlation
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields High correlation
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other fields High correlation
PromoInterval is highly correlated with Promo2SinceWeek and 1 other fields High correlation
DayOfWeek has 15299 (3.0%) missing values Missing
Open has 15455 (3.0%) missing values Missing
Promo has 15439 (3.0%) missing values Missing
StateHoliday has 15560 (3.0%) missing values Missing
SchoolHoliday has 15547 (3.0%) missing values Missing
StoreType has 15580 (3.0%) missing values Missing
Assortment has 15580 (3.0%) missing values Missing
CompetitionDistance has 16885 (3.3%) missing values Missing
CompetitionOpenSinceMonth has 173647 (33.9%) missing values Missing
CompetitionOpenSinceYear has 173647 (33.9%) missing values Missing
Promo2 has 15580 (3.0%) missing values Missing
Promo2SinceWeek has 260540 (50.8%) missing values Missing
Promo2SinceYear has 260540 (50.8%) missing values Missing
PromoInterval has 260540 (50.8%) missing values Missing
StateHoliday is an unsupported type, check if it needs cleaning or further analysis Unsupported
Store has 15580 (3.0%) zeros Zeros

Reproduction

Analysis started: 2021-10-30 11:50:21.345409
Analysis finished: 2021-10-30 11:51:11.445735
Duration: 50.1 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Date
Categorical

HIGH CARDINALITY

Distinct577
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size7.8 MiB
2013-11-07 | 1100
2013-12-30 | 1095
2013-09-23 | 1095
2013-10-01 | 1094
2014-06-13 | 1094
Other values (572) | 507478

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2013-01-01
2nd row2013-01-01
3rd row2013-01-01
4th row2013-01-01
5th row2013-01-01

Common Values

Value | Count | Frequency (%)
2013-11-07 | 1100 | 0.2%
2013-12-30 | 1095 | 0.2%
2013-09-23 | 1095 | 0.2%
2013-10-01 | 1094 | 0.2%
2014-06-13 | 1094 | 0.2%
2013-12-09 | 1093 | 0.2%
2013-08-09 | 1092 | 0.2%
2013-09-16 | 1092 | 0.2%
2013-12-04 | 1092 | 0.2%
2014-03-31 | 1091 | 0.2%
Other values (567) | 502018 | 97.9%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
2013-11-07 | 1100 | 0.2%
2013-09-23 | 1095 | 0.2%
2013-12-30 | 1095 | 0.2%
2013-10-01 | 1094 | 0.2%
2014-06-13 | 1094 | 0.2%
2013-12-09 | 1093 | 0.2%
2013-08-09 | 1092 | 0.2%
2013-09-16 | 1092 | 0.2%
2013-12-04 | 1092 | 0.2%
2014-06-23 | 1091 | 0.2%
Other values (567) | 502018 | 97.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Store
Real number (ℝ≥0)

ZEROS

Distinct1116
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean541.1669422
Minimum0
Maximum1115
Zeros15580
Zeros (%)3.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum0
5-th percentile23
Q1254
median540
Q3828
95-th percentile1058
Maximum1115
Range1115
Interquartile range (IQR)574

Descriptive statistics

Standard deviation330.8831476
Coefficient of variation (CV)0.611425277
Kurtosis-1.208068777
Mean541.1669422
Median Absolute Deviation (MAD)287
Skewness0.008515760983
Sum277594830
Variance109483.6573
MonotonicityNot monotonic
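Several of the statistics above are derived from one another. As a small check, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum), the Store figures can be recomputed like this:

```python
# Values copied from the Store statistics.
mean = 541.1669422
std = 330.8831476
q1, q3 = 254, 828
minimum, maximum = 0, 1115

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.6f} variance={variance:.1f} IQR={iqr} range={value_range}")
```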
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 15580 | 3.0%
733 | 551 | 0.1%
335 | 547 | 0.1%
85 | 546 | 0.1%
769 | 545 | 0.1%
494 | 544 | 0.1%
562 | 543 | 0.1%
262 | 541 | 0.1%
1097 | 541 | 0.1%
948 | 538 | 0.1%
Other values (1106) | 492480 | 96.0%

Value | Count | Frequency (%)
0 | 15580 | 3.0%
1 | 452 | 0.1%
2 | 457 | 0.1%
3 | 448 | 0.1%
4 | 450 | 0.1%
5 | 459 | 0.1%
6 | 451 | 0.1%
7 | 452 | 0.1%
8 | 447 | 0.1%
9 | 454 | 0.1%

Value | Count | Frequency (%)
1115 | 455 | 0.1%
1114 | 449 | 0.1%
1113 | 458 | 0.1%
1112 | 449 | 0.1%
1111 | 452 | 0.1%
1110 | 447 | 0.1%
1109 | 421 | 0.1%
1108 | 455 | 0.1%
1107 | 418 | 0.1%
1106 | 449 | 0.1%

DayOfWeek
Real number (ℝ≥0)

MISSING

Distinct7
Distinct (%)< 0.1%
Missing15299
Missing (%)3.0%
Infinite0
Infinite (%)0.0%
Mean3.525148847
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum1
5-th percentile1
Q12
median4
Q35
95-th percentile6
Maximum7
Range6
Interquartile range (IQR)3

Descriptive statistics

Standard deviation1.724263559
Coefficient of variation (CV)0.4891321285
Kurtosis-1.262780303
Mean3.525148847
Median Absolute Deviation (MAD)2
Skewness0.01405446637
Sum1754315
Variance2.973084821
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
6 | 85139 | 16.6%
2 | 84995 | 16.6%
3 | 82859 | 16.2%
5 | 82472 | 16.1%
1 | 80809 | 15.8%
4 | 79312 | 15.5%
7 | 2071 | 0.4%
(Missing) | 15299 | 3.0%

Value | Count | Frequency (%)
1 | 80809 | 15.8%
2 | 84995 | 16.6%
3 | 82859 | 16.2%
4 | 79312 | 15.5%
5 | 82472 | 16.1%
6 | 85139 | 16.6%
7 | 2071 | 0.4%

Value | Count | Frequency (%)
7 | 2071 | 0.4%
6 | 85139 | 16.6%
5 | 82472 | 16.1%
4 | 79312 | 15.5%
3 | 82859 | 16.2%
2 | 84995 | 16.6%
1 | 80809 | 15.8%

Open
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing15455
Missing (%)3.0%
Memory size7.8 MiB
1.0
497501 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

Value | Count | Frequency (%)
1.0 | 497501 | 97.0%
(Missing) | 15455 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 497501 | 100.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Promo
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing15439
Missing (%)3.0%
Memory size7.8 MiB
0.0
281859 
1.0
215658 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

Value | Count | Frequency (%)
0.0 | 281859 | 54.9%
1.0 | 215658 | 42.0%
(Missing) | 15439 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 281859 | 56.7%
1.0 | 215658 | 43.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

StateHoliday
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing15560
Missing (%)3.0%
Memory size7.8 MiB

SchoolHoliday
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing15547
Missing (%)3.0%
Memory size7.8 MiB
0.0
404398 
1.0
93011 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

Value | Count | Frequency (%)
0.0 | 404398 | 78.8%
1.0 | 93011 | 18.1%
(Missing) | 15547 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 404398 | 81.3%
1.0 | 93011 | 18.7%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

StoreType
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct4
Distinct (%)< 0.1%
Missing15580
Missing (%)3.0%
Memory size7.8 MiB
a | 268425
d | 154104
c | 65902
b | 8945

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowb
2nd rowb
3rd rowb
4th rowb
5th rowa

Common Values

Value | Count | Frequency (%)
a | 268425 | 52.3%
d | 154104 | 30.0%
c | 65902 | 12.8%
b | 8945 | 1.7%
(Missing) | 15580 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 268425 | 54.0%
d | 154104 | 31.0%
c | 65902 | 13.2%
b | 8945 | 1.8%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Assortment
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct3
Distinct (%)< 0.1%
Missing15580
Missing (%)3.0%
Memory size7.8 MiB
a | 263333
c | 229263
b | 4780

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowb
2nd rowa
3rd rowb
4th rowa
5th rowc

Common Values

Value | Count | Frequency (%)
a | 263333 | 51.3%
c | 229263 | 44.7%
b | 4780 | 0.9%
(Missing) | 15580 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 263333 | 52.9%
c | 229263 | 46.1%
b | 4780 | 1.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

CompetitionDistance
Real number (ℝ≥0)

MISSING

Distinct654
Distinct (%)0.1%
Missing16885
Missing (%)3.3%
Infinite0
Infinite (%)0.0%
Mean5442.092301
Minimum20
Maximum75860
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum20
5-th percentile140
Q1720
median2320
Q36890
95-th percentile20390
Maximum75860
Range75840
Interquartile range (IQR)6170

Descriptive statistics

Standard deviation7770.30909
Coefficient of variation (CV)1.427816483
Kurtosis13.41381819
Mean5442.092301
Median Absolute Deviation (MAD)1970
Skewness2.971593832
Sum2699664170
Variance60377703.36
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
250 | 5365 | 1.0%
1200 | 3859 | 0.8%
50 | 3614 | 0.7%
350 | 3590 | 0.7%
190 | 3550 | 0.7%
90 | 3215 | 0.6%
150 | 3170 | 0.6%
330 | 3105 | 0.6%
180 | 3096 | 0.6%
140 | 2706 | 0.5%
Other values (644) | 460801 | 89.8%
(Missing) | 16885 | 3.3%

Value | Count | Frequency (%)
20 | 440 | 0.1%
30 | 1785 | 0.3%
40 | 2236 | 0.4%
50 | 3614 | 0.7%
60 | 1351 | 0.3%
70 | 2210 | 0.4%
80 | 1353 | 0.3%
90 | 3215 | 0.6%
100 | 2238 | 0.4%
110 | 2670 | 0.5%

Value | Count | Frequency (%)
75860 | 507 | 0.1%
58260 | 513 | 0.1%
48330 | 460 | 0.1%
46590 | 463 | 0.1%
45740 | 440 | 0.1%
44320 | 454 | 0.1%
40860 | 502 | 0.1%
40540 | 457 | 0.1%
38710 | 450 | 0.1%
38630 | 504 | 0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct12
Distinct (%)< 0.1%
Missing173647
Missing (%)33.9%
Infinite0
Infinite (%)0.0%
Mean7.227197039
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum1
5-th percentile2
Q14
median8
Q310
95-th percentile12
Maximum12
Range11
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.20882083
Coefficient of variation (CV)0.4439924375
Kurtosis-1.242943338
Mean7.227197039
Median Absolute Deviation (MAD)3
Skewness-0.172440821
Sum2452253
Variance10.29653112
MonotonicityNot monotonic
Histogram with fixed size bins (bins=12)

Value | Count | Frequency (%)
9 | 56176 | 11.0%
4 | 42197 | 8.2%
11 | 41060 | 8.0%
3 | 31059 | 6.1%
7 | 29466 | 5.7%
12 | 28389 | 5.5%
10 | 27280 | 5.3%
6 | 22335 | 4.4%
5 | 19478 | 3.8%
2 | 18287 | 3.6%
Other values (2) | 23582 | 4.6%
(Missing) | 173647 | 33.9%

Value | Count | Frequency (%)
1 | 6188 | 1.2%
2 | 18287 | 3.6%
3 | 31059 | 6.1%
4 | 42197 | 8.2%
5 | 19478 | 3.8%
6 | 22335 | 4.4%
7 | 29466 | 5.7%
8 | 17394 | 3.4%
9 | 56176 | 11.0%
10 | 27280 | 5.3%

Value | Count | Frequency (%)
12 | 28389 | 5.5%
11 | 41060 | 8.0%
10 | 27280 | 5.3%
9 | 56176 | 11.0%
8 | 17394 | 3.4%
7 | 29466 | 5.7%
6 | 22335 | 4.4%
5 | 19478 | 3.8%
4 | 42197 | 8.2%
3 | 31059 | 6.1%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct23
Distinct (%)< 0.1%
Missing173647
Missing (%)33.9%
Infinite0
Infinite (%)0.0%
Mean2008.677733
Minimum1900
Maximum2015
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum1900
5-th percentile2001
Q12006
median2010
Q32013
95-th percentile2014
Maximum2015
Range115
Interquartile range (IQR)7

Descriptive statistics

Standard deviation6.128183346
Coefficient of variation (CV)0.003050854423
Kurtosis126.0468269
Mean2008.677733
Median Absolute Deviation (MAD)3
Skewness-7.872277114
Sum681562433
Variance37.55463113
MonotonicityNot monotonic
Histogram with fixed size bins (bins=23)

Value | Count | Frequency (%)
2013 | 37240 | 7.3%
2012 | 36513 | 7.1%
2014 | 31138 | 6.1%
2005 | 27532 | 5.4%
2010 | 24765 | 4.8%
2011 | 24385 | 4.8%
2009 | 23957 | 4.7%
2008 | 23947 | 4.7%
2007 | 21330 | 4.2%
2006 | 20943 | 4.1%
Other values (13) | 67559 | 13.2%
(Missing) | 173647 | 33.9%

Value | Count | Frequency (%)
1900 | 422 | 0.1%
1961 | 453 | 0.1%
1990 | 2231 | 0.4%
1994 | 891 | 0.2%
1995 | 880 | 0.2%
1998 | 442 | 0.1%
1999 | 3638 | 0.7%
2000 | 4465 | 0.9%
2001 | 7139 | 1.4%
2002 | 12141 | 2.4%

Value | Count | Frequency (%)
2015 | 16770 | 3.3%
2014 | 31138 | 6.1%
2013 | 37240 | 7.3%
2012 | 36513 | 7.1%
2011 | 24385 | 4.8%
2010 | 24765 | 4.8%
2009 | 23957 | 4.7%
2008 | 23947 | 4.7%
2007 | 21330 | 4.2%
2006 | 20943 | 4.1%

Promo2
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing15580
Missing (%)3.0%
Memory size7.8 MiB
1.0
252416 
0.0
244960 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row0.0
5th row0.0

Common Values

Value | Count | Frequency (%)
1.0 | 252416 | 49.2%
0.0 | 244960 | 47.8%
(Missing) | 15580 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 252416 | 50.7%
0.0 | 244960 | 49.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct24
Distinct (%)< 0.1%
Missing260540
Missing (%)50.8%
Infinite0
Infinite (%)0.0%
Mean23.49036511
Minimum1
Maximum50
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum1
5-th percentile1
Q113
median22
Q337
95-th percentile45
Maximum50
Range49
Interquartile range (IQR)24

Descriptive statistics

Standard deviation14.12463511
Coefficient of variation (CV)0.6012948305
Kurtosis-1.3822398
Mean23.49036511
Median Absolute Deviation (MAD)13
Skewness0.08477180709
Sum5929344
Variance199.5053169
MonotonicityNot monotonic
Histogram with fixed size bins (bins=24)

Value | Count | Frequency (%)
14 | 36053 | 7.0%
40 | 33066 | 6.4%
31 | 19647 | 3.8%
10 | 18745 | 3.7%
5 | 17468 | 3.4%
37 | 15657 | 3.1%
1 | 15637 | 3.0%
13 | 14934 | 2.9%
45 | 14790 | 2.9%
22 | 14473 | 2.8%
Other values (14) | 51946 | 10.1%
(Missing) | 260540 | 50.8%

Value | Count | Frequency (%)
1 | 15637 | 3.0%
5 | 17468 | 3.4%
6 | 447 | 0.1%
9 | 6205 | 1.2%
10 | 18745 | 3.7%
13 | 14934 | 2.9%
14 | 36053 | 7.0%
18 | 12946 | 2.5%
22 | 14473 | 2.8%
23 | 2154 | 0.4%

Value | Count | Frequency (%)
50 | 451 | 0.1%
49 | 415 | 0.1%
48 | 4147 | 0.8%
45 | 14790 | 2.9%
44 | 1328 | 0.3%
40 | 33066 | 6.4%
39 | 2557 | 0.5%
37 | 15657 | 3.1%
36 | 4431 | 0.9%
35 | 11127 | 2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct7
Distinct (%)< 0.1%
Missing260540
Missing (%)50.8%
Infinite0
Infinite (%)0.0%
Mean2011.761014
Minimum2009
Maximum2015
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size7.8 MiB

Quantile statistics

Minimum2009
5-th percentile2009
Q12011
median2012
Q32013
95-th percentile2014
Maximum2015
Range6
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.668102497
Coefficient of variation (CV)0.0008291752779
Kurtosis-1.052532275
Mean2011.761014
Median Absolute Deviation (MAD)1
Skewness-0.1199145751
Sum507800668
Variance2.782565942
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
2011 | 56774 | 11.1%
2013 | 53661 | 10.5%
2014 | 41116 | 8.0%
2012 | 36119 | 7.0%
2009 | 32141 | 6.3%
2010 | 28167 | 5.5%
2015 | 4438 | 0.9%
(Missing) | 260540 | 50.8%

Value | Count | Frequency (%)
2009 | 32141 | 6.3%
2010 | 28167 | 5.5%
2011 | 56774 | 11.1%
2012 | 36119 | 7.0%
2013 | 53661 | 10.5%
2014 | 41116 | 8.0%
2015 | 4438 | 0.9%

Value | Count | Frequency (%)
2015 | 4438 | 0.9%
2014 | 41116 | 8.0%
2013 | 53661 | 10.5%
2012 | 36119 | 7.0%
2011 | 56774 | 11.1%
2010 | 28167 | 5.5%
2009 | 32141 | 6.3%

PromoInterval
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct3
Distinct (%)< 0.1%
Missing260540
Missing (%)50.8%
Memory size7.8 MiB
Jan,Apr,Jul,Oct
147410 
Feb,May,Aug,Nov
57758 
Mar,Jun,Sept,Dec
47248 

Length

Max length16
Median length15
Mean length15.18718306
Min length15

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFeb,May,Aug,Nov
2nd rowJan,Apr,Jul,Oct
3rd rowMar,Jun,Sept,Dec
4th rowJan,Apr,Jul,Oct
5th rowFeb,May,Aug,Nov

Common Values

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 147410 | 28.7%
Feb,May,Aug,Nov | 57758 | 11.3%
Mar,Jun,Sept,Dec | 47248 | 9.2%
(Missing) | 260540 | 50.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
jan,apr,jul,oct | 147410 | 58.4%
feb,may,aug,nov | 57758 | 22.9%
mar,jun,sept,dec | 47248 | 18.7%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Interactions

[Figure: 49 pairwise interaction plots; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
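As a minimal sketch of that definition, the pure-Python example below ranks both variables (tied values share the mean of their positions, a common convention assumed here) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names and toy data are illustrative and are not taken from the report.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/(n-1) factors cancel between numerator and denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic in x, but not linear

rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because y increases whenever x does, ρ comes out at (numerically) 1, while Pearson's r on the raw values stays below 1.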

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
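The formula and the invariance property can be checked with a few lines of pure Python. This is a sketch on made-up numbers, not the report's own computation. Note that the invariance holds for positive scale factors; a negative factor flips the sign of r.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/(n-1) factors in the covariance and in both standard deviations
    # cancel, so plain sums are sufficient.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Illustrative toy data.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

r = pearson(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor on one variable flips the sign.
r_flipped = pearson(x, [-b for b in y])

print(r, r_transformed, r_flipped)
```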

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
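The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that on toy data. Statistical libraries often report a tie-adjusted variant (tau-b) instead, so values can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y; it counts toward neither group.
    return (concordant - discordant) / len(pairs)

# Illustrative toy data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

tau = kendall_tau_a(x, y)
print(tau)
```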

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma (2013).
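A minimal pure-Python sketch of the bias-corrected estimator follows, assuming the commonly cited form of Bergsma's correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1), and the numbers of rows r and columns k are shrunk in the same spirit. The function name and toy labels are hypothetical, and the report's own implementation may differ in detail.

```python
from collections import Counter

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    r, k = len(row), len(col)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for label_a, n_a in row.items():
        for label_b, n_b in col.items():
            expected = n_a * n_b / n
            observed = joint.get((label_a, label_b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form, see lead-in).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. a constant column
    return (phi2_corr / denom) ** 0.5

# Toy data: the second variable is fully determined by the first.
store_type = ["a", "b", "c"] * 20
assortment = ["x", "y", "z"] * 20
v_perfect = cramers_v_corrected(store_type, assortment)

# Toy data: every combination occurs equally often (independence).
left = ["a", "a", "b", "b"] * 10
right = ["x", "y", "x", "y"] * 10
v_independent = cramers_v_corrected(left, right)

print(v_perfect, v_independent)
```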

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
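One way to read nullity correlation, assuming it is the ordinary Pearson correlation between "is missing" indicators of two columns, is sketched below. The column contents are toy values in the style of this dataset, with None standing in for a missing cell. The function name is hypothetical.

```python
from statistics import mean

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    if sa == 0 or sb == 0:
        # A column that is never (or always) missing has no variation.
        return float("nan")
    return cov / (sa * sb)

# Toy columns: the two Promo2 fields are missing on exactly the same rows.
promo2_since_week = [14.0, None, 5.0, None, None, 10.0]
promo_interval = ["Feb,May,Aug,Nov", None, "Mar,Jun,Sept,Dec", None, None, "Jan,Apr,Jul,Oct"]
competition_distance = [900.0, 90.0, None, 1260.0, 18160.0, 1270.0]

same_pattern = nullity_correlation(promo2_since_week, promo_interval)
other_pattern = nullity_correlation(promo2_since_week, competition_distance)
print(same_pattern, other_pattern)
```

Columns that go missing together score close to +1, which matches the identical missing counts (260540) reported above for Promo2SinceWeek, Promo2SinceYear and PromoInterval.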
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

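(index) | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
0 | 2013-01-01 | 353 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 900.0 | NaN | NaN | 1.0 | 14.0 | 2013.0 | Feb,May,Aug,Nov
1 | 2013-01-01 | 335 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 90.0 | NaN | NaN | 1.0 | 31.0 | 2013.0 | Jan,Apr,Jul,Oct
2 | 2013-01-01 | 512 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 590.0 | NaN | NaN | 1.0 | 5.0 | 2013.0 | Mar,Jun,Sept,Dec
3 | 2013-01-01 | 494 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 1260.0 | 6.0 | 2011.0 | 0.0 | NaN | NaN | NaN
4 | 2013-01-01 | 530 | 2.0 | 1.0 | 0.0 | a | 1.0 | a | c | 18160.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
5 | 2013-01-01 | 423 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 1270.0 | 5.0 | 2014.0 | 0.0 | NaN | NaN | NaN
6 | 2013-01-01 | 85 | 2.0 | 1.0 | NaN | a | 1.0 | b | a | 1870.0 | 10.0 | 2011.0 | 0.0 | NaN | NaN | NaN
7 | 2013-01-01 | 274 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 3640.0 | NaN | NaN | 1.0 | 10.0 | 2013.0 | Jan,Apr,Jul,Oct
8 | 2013-01-01 | 262 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 1180.0 | 5.0 | 2013.0 | 0.0 | NaN | NaN | NaN
9 | 2013-01-01 | 259 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 210.0 | NaN | NaN | 0.0 | NaN | NaN | NaN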

Last rows

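(index) | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
512946 | 2014-07-31 | 746 | 4.0 | 1.0 | 1.0 | 0 | 0.0 | d | c | 4330.0 | 2.0 | 2011.0 | 1.0 | 35.0 | 2011.0 | Mar,Jun,Sept,Dec
512947 | 2014-07-31 | 747 | 4.0 | 1.0 | 1.0 | 0 | 1.0 | c | c | 45740.0 | 8.0 | 2008.0 | 0.0 | NaN | NaN | NaN
512948 | 2014-07-31 | 748 | 4.0 | 1.0 | 1.0 | 0 | 1.0 | d | a | 2380.0 | 3.0 | 2010.0 | 1.0 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct
512949 | 2014-07-31 | 749 | 4.0 | 1.0 | NaN | 0 | 1.0 | a | a | 3410.0 | 8.0 | 2011.0 | 1.0 | 14.0 | 2015.0 | Jan,Apr,Jul,Oct
512950 | 2014-07-31 | 743 | 4.0 | 1.0 | 1.0 | 0 | 1.0 | a | a | 6710.0 | 11.0 | 2003.0 | 1.0 | 14.0 | 2012.0 | Jan,Apr,Jul,Oct
512951 | 2014-07-31 | 752 | 4.0 | 1.0 | 1.0 | 0 | 1.0 | a | a | 970.0 | 3.0 | 2013.0 | 1.0 | 31.0 | 2013.0 | Feb,May,Aug,Nov
512952 | 2014-07-31 | 753 | NaN | 1.0 | 1.0 | 0 | 1.0 | d | c | 540.0 | 11.0 | 2012.0 | 1.0 | 35.0 | 2010.0 | Mar,Jun,Sept,Dec
512953 | 2014-07-31 | 754 | 4.0 | 1.0 | 1.0 | 0 | NaN | c | c | 380.0 | 5.0 | 2008.0 | 1.0 | 10.0 | 2014.0 | Mar,Jun,Sept,Dec
512954 | 2014-07-31 | 755 | 4.0 | 1.0 | 1.0 | 0 | 1.0 | d | c | 13130.0 | 12.0 | 2003.0 | 0.0 | NaN | NaN | NaN
512955 | 2014-07-31 | 751 | 4.0 | 1.0 | 1.0 | 0 | 1.0 | a | a | 650.0 | 10.0 | 2006.0 | 0.0 | NaN | NaN | NaN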
5129552014-07-317514.01.01.001.0aa650.010.02006.00.0NaNNaNNaN